Online Sub-Sampling for Reinforcement Learning with General Function Approximation
Designing provably efficient algorithms with general function approximation
is an important open problem in reinforcement learning. Recently, Wang et
al. [2020c] established a value-based algorithm with general function
approximation that enjoys an $\widetilde{O}(\mathrm{poly}(dH)\sqrt{K})$ regret
bound (throughout the paper, $\widetilde{O}(\cdot)$ suppresses logarithmic
factors), where $d$ depends on the complexity of the function class, $H$ is the
planning horizon, and $K$ is the total number of episodes. However, their
algorithm requires $\Omega(K)$ computation time per round, rendering it
inefficient for practical use. In this paper, by applying online sub-sampling
techniques, we develop an algorithm that takes $\widetilde{O}(\mathrm{poly}(dH))$
computation time per round on average and enjoys nearly the same regret bound.
Furthermore, the algorithm achieves low switching cost, i.e., it changes the
policy only $\widetilde{O}(\mathrm{poly}(dH))$ times during its execution,
making it appealing for real-world deployment. Moreover, by using an
upper-confidence-based, exploration-driven reward function, the algorithm
provably explores the environment in the reward-free setting. In particular,
after $\widetilde{O}(\mathrm{poly}(dH))/\epsilon^{2}$ rounds of exploration,
the algorithm outputs an $\epsilon$-optimal policy for any given reward function.
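The abstract does not spell out the sub-sampling rule, so the following is only a generic sketch of online importance sub-sampling, not the paper's algorithm. It assumes a user-supplied `sensitivity` score per data point (a hypothetical stand-in for whatever importance measure the paper uses): each arriving point is kept with probability proportional to its score and reweighted by the inverse of that probability, and an expensive update (counted by the hypothetical `updates` variable) runs only when the kept set changes. When scores decay over time, the kept set, and hence the number of updates, grows far more slowly than the stream.

```python
import random
from typing import Callable, Iterable, List, Tuple


def online_subsample(
    stream: Iterable[int],
    sensitivity: Callable[[int], float],  # hypothetical per-point importance score
    oversample: float = 4.0,
    seed: int = 0,
) -> Tuple[List[Tuple[int, float]], int]:
    """Keep each arriving point with probability p = min(1, oversample * s) and
    store importance weight 1/p, so weighted sums over the kept set are unbiased
    estimates of sums over all points that had p > 0."""
    rng = random.Random(seed)
    kept: List[Tuple[int, float]] = []
    updates = 0  # how often an expensive model/policy update would run
    for x in stream:
        p = min(1.0, oversample * sensitivity(x))
        if p > 0.0 and rng.random() < p:
            kept.append((x, 1.0 / p))
            updates += 1  # recompute only when the sub-sample actually changes
    return kept, updates


# Scores decaying like 1/(t+1): the kept set grows roughly logarithmically.
kept, updates = online_subsample(range(10_000), lambda t: 1.0 / (t + 1))
print(f"kept {len(kept)} of 10000 points, {updates} updates")
```

The inverse-probability weights are what let a small sub-sample stand in for the full history when fitting a model.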
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation
3D perception based on representations learned from multi-camera
bird's-eye-view (BEV) is trending, as cameras are cost-effective for mass
production in the autonomous driving industry. However, there is a distinct
performance gap between multi-camera BEV and LiDAR-based 3D object detection.
One key reason is that LiDAR captures accurate depth and other geometric
measurements, whereas inferring such 3D information from image input alone is
notoriously challenging. In this work, we propose to boost the representation
learning of a multi-camera BEV-based student detector by training it to imitate
the features of a well-trained LiDAR-based teacher detector. We propose an
effective balancing strategy that forces the student to focus on learning the
crucial features from the teacher, and we generalize knowledge transfer to
multi-scale layers with temporal fusion. We conduct extensive evaluations on
multiple representative multi-camera BEV models. Experiments reveal that our
approach yields significant improvements over the student models, achieving
state-of-the-art performance on the popular nuScenes benchmark.
Comment: ICCV 202
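The abstract does not give DistillBEV's loss, so here is a hedged sketch of what a "balanced" feature-imitation objective can look like: a weighted mean-squared error between student and teacher BEV feature maps that up-weights cells marked as foreground. The nested-list feature layout, the `fg_mask`, and the `fg_weight`/`bg_weight` values are all illustrative assumptions, not the paper's actual balancing strategy.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [H][W][C] BEV feature grid


def balanced_imitation_loss(
    student: FeatureMap,
    teacher: FeatureMap,
    fg_mask: List[List[bool]],  # hypothetical: True where a cell lies on an object
    fg_weight: float = 1.0,
    bg_weight: float = 0.1,     # illustrative down-weighting of background cells
) -> float:
    """Weighted mean-squared error between student and teacher features.
    Normalizing by the total weight keeps the few foreground cells from being
    swamped by the many background cells of a BEV grid."""
    num, den = 0.0, 0.0
    for i, row in enumerate(student):
        for j, s_vec in enumerate(row):
            t_vec = teacher[i][j]
            w = fg_weight if fg_mask[i][j] else bg_weight
            per_cell = sum((s - t) ** 2 for s, t in zip(s_vec, t_vec)) / len(s_vec)
            num += w * per_cell
            den += w
    return num / den if den > 0.0 else 0.0


student = [[[1.0], [0.0]]]  # 1 x 2 grid, 1 channel
teacher = [[[0.0], [0.0]]]
print(balanced_imitation_loss(student, teacher, [[True, False]]))  # error on a foreground cell
print(balanced_imitation_loss(student, teacher, [[False, True]]))  # same error on a background cell
```

The same per-cell error costs ten times more on a foreground cell, which is the intended effect of steering the student toward object regions.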
Onfocus detection: Identifying individual-camera eye contact from unconstrained images
Onfocus detection aims to identify whether the focus of an individual
captured by a camera is on the camera or not. According to behavioral research,
the focus of an individual during face-to-camera communication leads to a
special type of eye contact, i.e., individual-camera eye contact, which is
a powerful signal in social communication and plays a crucial role in
recognizing irregular individual status (e.g., lying or suffering from mental
illness) and special purposes (e.g., seeking help or attracting fans). Thus,
developing effective onfocus detection algorithms is valuable for
assisting criminal investigation, disease discovery, and social behavior
analysis. However, a review of the literature shows that very few efforts
have been made toward developing onfocus detectors, due to the lack of
large-scale publicly available datasets as well as the challenging nature of
the task. To this end, this paper addresses both of these issues. First, we
build a large-scale onfocus detection dataset, named OnFocus Detection In the
Wild (OFDIW). It consists of 20,623 images captured in unconstrained
conditions (hence "in the wild") and contains individuals with diverse
emotions, ages, facial characteristics, and rich interactions with surrounding
objects and background scenes. On top of that, we propose a novel end-to-end
deep model, the eye-context interaction inferring network (ECIIN), for onfocus
detection, which explores eye-context interaction via dynamic capsule routing.
Finally, comprehensive experiments are conducted on the proposed OFDIW dataset
to benchmark existing learning models and demonstrate the effectiveness of the
proposed ECIIN. The project (containing both the dataset and code) is at
https://github.com/wintercho/focus
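For readers unfamiliar with capsule routing, the sketch below implements the routing-by-agreement procedure from the original capsule-network work (Sabour et al., 2017) in plain Python. It only illustrates the mechanism the abstract names; ECIIN's dynamic routing may differ, and the nested-list layout where `u_hat[i][j]` is input capsule i's prediction for output capsule j is an assumption of this sketch.

```python
import math
from typing import List

Vec = List[float]


def squash(s: Vec) -> Vec:
    """Capsule non-linearity: keeps the direction, maps the length into [0, 1)."""
    n2 = sum(x * x for x in s)
    if n2 == 0.0:
        return [0.0] * len(s)
    scale = n2 / (1.0 + n2) / math.sqrt(n2)
    return [scale * x for x in s]


def dynamic_routing(u_hat: List[List[Vec]], iters: int = 3) -> List[Vec]:
    """Routing-by-agreement: u_hat[i][j] is input capsule i's prediction for output j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vec] = [[0.0] * dim for _ in range(n_out)]
    for _ in range(iters):
        # Coupling coefficients: softmax of each input's logits over output capsules.
        c = []
        for i in range(n_in):
            m = max(b[i])
            e = [math.exp(x - m) for x in b[i]]
            z = sum(e)
            c.append([x / z for x in e])
        # Each output is the squashed, coupling-weighted sum of its predictions.
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v[j] = squash(s)
        # Raise logits where a prediction agrees (positive dot product) with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v


# Two input capsules agree on output 0 but contradict each other on output 1.
u_hat = [
    [[3.0, 0.0], [3.0, 0.0]],
    [[3.0, 0.0], [-3.0, 0.0]],
]
v = dynamic_routing(u_hat)
print([round(math.hypot(*vec), 3) for vec in v])
```

Output capsule 0, which receives consistent predictions, ends up with a long vector, while the contradictory predictions for output 1 cancel out.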
Cross-Modality High-Frequency Transformer for MR Image Super-Resolution
Improving the resolution of magnetic resonance (MR) image data is critical to
computer-aided diagnosis and brain function analysis. Higher resolution helps
capture more detailed content, but typically leads to a lower
signal-to-noise ratio and longer scanning time. For this reason, MR image
super-resolution has attracted wide interest in recent years. Existing
works build extensive deep models on conventional architectures based
on convolutional neural networks (CNNs). In this work, to further advance this
research field, we make an early effort to build a Transformer-based MR image
super-resolution framework, carefully designed to exploit valuable domain
prior knowledge. Specifically, we consider two domain priors, namely the
high-frequency structure prior and the inter-modality context prior, and
establish a novel Transformer architecture, called Cross-modality
high-frequency Transformer (Cohf-T), to introduce these priors into
super-resolving low-resolution (LR) MR images. Comprehensive experiments on
two datasets indicate that Cohf-T achieves new state-of-the-art performance.
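One common way to expose high-frequency structure is to subtract a low-pass (blurred) copy of an image from the image itself, leaving edges and fine texture. The sketch below does this with a box blur on a 2D list; the box filter and its `radius` parameter are assumptions for illustration, since the abstract does not say how Cohf-T extracts its high-frequency prior.

```python
from typing import List

Image = List[List[float]]


def box_blur(img: Image, radius: int = 1) -> Image:
    """Low-pass filter: mean over a (2r+1) x (2r+1) window, clamping at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, n = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate edge pixels
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
                    n += 1
            out[y][x] = acc / n
    return out


def high_frequency(img: Image, radius: int = 1) -> Image:
    """High-frequency residual: the image minus its low-pass version."""
    low = box_blur(img, radius)
    return [[p - q for p, q in zip(r_img, r_low)] for r_img, r_low in zip(img, low)]


# A vertical step edge: the residual is non-zero only next to the edge.
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
hf = high_frequency(step)
print([round(val, 3) for val in hf[1]])
```

Flat regions map to zero, so the residual concentrates exactly on the structures that super-resolution most needs to recover.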
Optimizing Error-Bounded Lossy Compression for Three-Dimensional Adaptive Mesh Refinement Simulations
Today's scientific simulations require a significant reduction of data volume
because of the extremely large amounts of data they produce and the limited I/O
bandwidth and storage space. Error-bounded lossy compression has been
considered one of the most effective solutions to this problem. However,
little work has been done to improve error-bounded lossy compression for
Adaptive Mesh Refinement (AMR) simulation data. Unlike previous work, which
only leverages 1D compression, in this work we propose to leverage
high-dimensional (e.g., 3D) compression for each refinement level of AMR data.
To remove the data redundancy across different levels, we propose three
preprocessing strategies and apply them adaptively based on the data
characteristics. Experiments on seven AMR datasets from a real-world
large-scale AMR simulation demonstrate that our proposed approach can improve
the compression ratio by up to 3.3X under the same data distortion, compared to
the state-of-the-art method. In addition, we leverage the flexibility of our
approach to tune the error bound for each level, which achieves much lower data
distortion on two application-specific metrics.
Comment: 13 pages, 17 figures, 3 tables, accepted by ACM HPDC 202
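To make "error-bounded" concrete: a common building block in prediction-based lossy compressors (for example, the SZ family) is to predict each value and quantize the prediction residual into bins of width 2·eb, which guarantees every reconstructed value lies within eb of the original. The sketch below is not the paper's pipeline. It uses a simple 1D previous-value predictor for brevity (the paper's point is precisely to go beyond 1D), omits the entropy-coding stage that produces the actual size reduction, and uses a hypothetical per-level error-bound dictionary to mirror the abstract's per-level tuning.

```python
import random
from typing import Dict, List


def compress(data: List[float], eb: float) -> List[int]:
    """Predict each value from the previously *reconstructed* value and quantize
    the residual into bins of width 2*eb, so |x - x'| <= eb for every value.
    Predicting from reconstructed values keeps errors from accumulating."""
    codes: List[int] = []
    prev = 0.0
    for x in data:
        q = round((x - prev) / (2.0 * eb))  # integer quantization code
        codes.append(q)
        prev = prev + 2.0 * eb * q          # same expression the decoder evaluates
    return codes


def decompress(codes: List[int], eb: float) -> List[float]:
    out: List[float] = []
    prev = 0.0
    for q in codes:
        prev = prev + 2.0 * eb * q
        out.append(prev)
    return out


# Hypothetical AMR data: one flattened array per refinement level, each with its own bound.
rng = random.Random(0)
levels: Dict[int, List[float]] = {lvl: [rng.uniform(-10, 10) for _ in range(1000)] for lvl in (0, 1)}
eb_per_level = {0: 1e-2, 1: 1e-3}  # illustrative: tighter bound on the finer level

recon = {lvl: decompress(compress(v, eb_per_level[lvl]), eb_per_level[lvl]) for lvl, v in levels.items()}
for lvl in levels:
    err = max(abs(a - b) for a, b in zip(levels[lvl], recon[lvl]))
    print(f"level {lvl}: max error {err:.2e} (bound {eb_per_level[lvl]:.0e})")
```

Because the bound is a per-call parameter, choosing a different `eb` for each refinement level is a one-line change, which is the kind of flexibility the abstract exploits.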